ABSTRACT

Incorporating side information of a text corpus, e.g., authors,
time stamps, and emotional tags, into traditional text mining
models has gained significant interest in information retrieval,
statistical natural language processing, and machine learning.
One branch of this work is the so-called Author Topic Model (ATM),
which incorporates authors' interests as side information into the
classical topic model. However, the existing ATM needs the number
of topics to be predefined, which is difficult and inappropriate
in many real-world settings. In this paper, we propose an Infinite
Author Topic (IAT) model to resolve this issue. Instead of assigning
a discrete probability on a fixed number of topics, we use a
stochastic process to determine the number of topics from the data
itself. Specifically, we extend a gamma-negative binomial process
to three levels in order to capture the author-document-keyword
hierarchical structure. Furthermore, each document is assigned a
mixed gamma process that accounts for the contributions of its
multiple authors. An efficient Gibbs sampling inference algorithm,
in which every conditional distribution has a closed form, is
developed for the IAT model. Experiments on several real-world
datasets show that the IAT model can simultaneously learn the
hidden topics, the authors' interests in these topics, and the
number of topics.

Categories and Subject Descriptors 

H.2.8 [Database Applications]: Data mining; I.2.6 [Artificial
Intelligence]: Learning
1. INTRODUCTION

Traditional text mining algorithms model a text corpus with only
two levels: document-word. Topic models are commonly regarded as
efficient tools for text mining because they learn the hidden
topics [2]. Recently, attention has turned to the side information
of the text corpus, which includes the conferences of the papers
[21], time stamps [24], authors [15, 20], entities [9], emotion
tags [1] and other labels [28]. Incorporating this side information
into the classical topic models benefits many real-world tasks.
Among these models, the Author Topic Model (ATM) [18, 20, 17] adds
a set of variables to the original topic model in order to
represent and infer the interests of authors together with the
hidden topics.

The ability to jointly learn the hidden topics and the authors'
interests in these topics has a variety of application scenarios.
For example: 1) an academic recommendation system can recommend
authors and/or papers with research interests similar to those of
the input author; 2) the most and least surprising papers for an
author can be detected [20]; 3) in an author-topic-based paper
browser, a set of papers can be ranked according to authors and
topics; 4) author disambiguation [26].

One drawback of the existing author topic model is that the number
of hidden topics needs to be fixed in advance. This number is
normally chosen with domain knowledge. By fixing the number of
topics, ATM can adopt Dirichlet and Multinomial distributions with
a predefined dimension. However, limiting each document to exactly
a fixed number of topics is unrealistic for many real-world
applications. In this paper, we propose an infinite author topic
(IAT) model to relax this assumption. Instead of using
fixed-dimensional distributions, stochastic processes are used:
specifically, the gamma-negative binomial process [27] is extended
to three levels to capture the hierarchical structure
author-document-keyword. In this model, each document is assigned
a gamma process that expresses the interest of the document in the
hidden topics, instead of a vector with a fixed dimension. This
gamma process can be loosely regarded as an infinite discrete
distribution, and is parameterized by a base measure (another gamma
process) that denotes the interest of the document's author in the
hidden topics. However, a document normally has multiple authors,
so we assign each document a mixed gamma process built from the
gamma processes of all its authors. Introducing the mixed gamma
process complicates model inference. We therefore develop an
efficient Gibbs sampler with closed-form conditional distributions
for the proposed model. Experiments on two real-world datasets show
that our model can learn both the hidden topics and the number of
topics simultaneously.

The main contributions of this paper are as follows:

1. we propose a new nonparametric Bayesian model that relaxes
the fixed topic number assumption of traditional author
topic models;

2. we design an efficient Gibbs sampling inference algorithm
for the proposed model.

The rest of the paper is structured as follows. Section 2 briefly
introduces the related work. Section 3 describes some preliminary
knowledge. The IAT model is presented in Section 4 together with
its Gibbs sampling inference algorithm. Section 5 reports the
experimental results of the IAT model on real-world datasets.
Finally, Section 6 concludes this study with a discussion of future
directions.

2. RELATED WORK 

In this section, we briefly review the related work of this 
study. The first part is about the topic models, and the 
second part is about nonparametric Bayesian learning. 

2.1 Topic Models 

Topic models [2] are Bayesian models with fixed-dimensional
probability distributions. They were originally designed for text
mining, where they aim to discover the hidden topics in a text
corpus to assist document clustering or classification. Owing to
their extensibility and representational power, they have been
successfully applied in many research areas, including the analysis
of images [12], video [10], genetics [5] and music. Among these
extensions, author topic models [18, 20, 17] were proposed to infer
the hidden topics and author interests. Each document is assumed to
be generated by its authors according to their interests over the
hidden topics. This model is explained in more detail in Section 3.

ATM has attracted a lot of attention from researchers working in
text mining, because it provides an elegant way to incorporate the
side (in this case, author) information of the documents into topic
learning. The model can be extended to incorporate other side
information of a text corpus, such as emotional tags [1],
conferences [21] and time stamps [24].

2.2 Nonparametric Bayesian Learning 

Nonparametric Bayesian learning is a key approach for learning the
number of mixtures in a mixture model (also called the model
selection problem). Rather than being predefined, the number of
mixtures is inferred from the data, i.e., we let the data speak.

The idea of nonparametric Bayesian learning is to replace
traditional fixed-dimensional probability distributions, such as
the Multinomial, Poisson, and Dirichlet distributions, with
stochastic processes. To avoid the limitation of fixed dimensions,
the Multinomial Process (MP), Poisson Process (PP) [8] and
Dirichlet Process (DP) [22] are used in place of these
distributions because they are infinite-dimensional.

The merit of these stochastic processes is that they let the data
determine the number of factors (in text mining, topics). DP is a
good alternative for models with a Dirichlet distribution as the
prior. Many probabilistic models with fixed dimensions have been
extended to infinite ones with the help of stochastic processes:
the Gaussian Mixture Model (GMM) is extended to the Infinite
Gaussian Mixture Model (IGMM) [16] using DP; the Hidden Markov
Model is extended to an infinite number of hidden states using the
Hierarchical Dirichlet Process [23]. Through posterior inference
(e.g., Markov chain Monte Carlo (MCMC) [11]), the number of
mixtures can be inferred. Other processes, including the beta
process, gamma process, Poisson process, multinomial process and
negative binomial process (NBP) [27, 3], have also been used in the
machine learning community recently.

To summarize, nonparametric Bayesian learning [4] has been
successfully used to extend many finite models and has been applied
to many real-world problems. However, to the best of our knowledge,
no work has yet used NBP for author topic modelling. This paper
addresses this gap by proposing a mixed gamma-negative binomial
process that extends the finite author topic model to an infinite
one.

3. PRELIMINARY KNOWLEDGE 

This section briefly introduces the models that are used in the
remaining sections.

3.1 Author Topic Model 

The Author Topic Model [18, 20, 17] aims to learn the hidden topics
from the papers and, more importantly, the authors' interests in
these topics. Based on the classical LDA [2], a set of new
variables is introduced to indicate the authors' interests. The
graphical representation of the model is shown in Fig. 1 and the
generative procedure is as follows,

$$
\begin{aligned}
\rho_a &\sim \mathrm{Dirichlet}(\alpha) \\
\phi_k &\sim \mathrm{Dirichlet}(\beta) \\
x_{d,n} &\sim \mathrm{Unif}(a_d) \\
z_{d,n} &\sim \mathrm{Discrete}(\rho_{x_{d,n}}) \\
w_{d,n} &\sim \mathrm{Discrete}(\phi_{z_{d,n}})
\end{aligned}
\tag{1}
$$

where $\{\rho_a\}_{a=1}^{A}$ denote the authors' interests in the
topics and $a_d$ denotes the authors of document $d$. Eq. (1) shows
that the ATM is built from fixed-dimensional probability
distributions. One issue with this model is that the number of
topics needs to be predefined, because the dimensions of the
probability distributions need to be predefined. However,
predefining the topic number is difficult and inappropriate in many
real-world scenarios.
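The generative procedure in Eq. (1) can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names, the symmetric hyper-parameter values `alpha` and `beta`, and the toy author lists are assumptions made for the example and do not come from the ATM papers.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution via normalized gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]

def sample_discrete(probs, rng):
    """Draw an index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_atm_corpus(doc_authors, doc_lengths, K, V,
                        alpha=0.5, beta=0.1, seed=0):
    """Generate (author, topic, word) triples for every word position.

    doc_authors[d] is the list a_d of author ids of document d.
    K is the fixed number of topics, V the vocabulary size.
    """
    rng = random.Random(seed)
    A = 1 + max(a for authors in doc_authors for a in authors)
    rho = [sample_dirichlet([alpha] * K, rng) for _ in range(A)]  # author interests
    phi = [sample_dirichlet([beta] * V, rng) for _ in range(K)]   # topics
    corpus = []
    for authors, length in zip(doc_authors, doc_lengths):
        words = []
        for _ in range(length):
            x = rng.choice(authors)             # x_{d,n} ~ Unif(a_d)
            z = sample_discrete(rho[x], rng)    # z_{d,n} ~ Discrete(rho_x)
            w = sample_discrete(phi[z], rng)    # w_{d,n} ~ Discrete(phi_z)
            words.append((x, z, w))
        corpus.append(words)
    return rho, phi, corpus

# Two toy documents: the first written by authors 0 and 1, the second by author 1.
rho, phi, corpus = generate_atm_corpus([[0, 1], [1]], [5, 3], K=4, V=6)
```

Note that `K` has to be passed in explicitly: every `rho[a]` is a vector of exactly `K` entries, which is the fixed-dimension restriction discussed above.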

3.2 Gamma Negative Binomial Process 


3.2.1 Gamma Process 

A gamma process $\mathrm{GaP}(\alpha, H)$ [19] is a stochastic
process, where $H$ is a base (shape) measure and $\alpha$ is the
concentration (scale) parameter. It also corresponds to a
completely random measure. Let
$\Gamma = \{(\pi_i, \theta_i)\}_{i=1}^{\infty}$ be a random
realization of a gamma process in the product space
$\mathbb{R}_+ \times \Theta$. Then, we have

$$
\Gamma \sim \mathrm{GaP}(\alpha, H), \qquad
\Gamma = \sum_{i=1}^{\infty} \pi_i \, \delta_{\theta_i}
\tag{2}
$$

where $\delta(\cdot)$ is an indicator function, $\pi_i$ follows an
improper gamma distribution $\mathrm{gamma}(0, \alpha)$, and
$\theta_i \sim H$. Normalizing the weights $\{\pi_i\}$ yields the
well-known Dirichlet process [22].
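A finite sketch may help make the realization in Eq. (2) concrete. The snippet below assumes a K-atom approximation in which each weight is an independent Gamma(total_mass / K, 1 / c) draw, the same form as the truncation used for inference in Section 4.2; the function names, the uniform base sampler and the parameter values are illustrative assumptions.

```python
import random

def truncated_gamma_process(total_mass, c, K, base_sampler, rng):
    """Finite approximation of a gamma process realization.

    Returns K weights, each Gamma(total_mass / K, scale 1 / c), and K atoms
    drawn independently from the (normalized) base measure.
    """
    atoms = [base_sampler(rng) for _ in range(K)]
    weights = [rng.gammavariate(total_mass / K, 1.0 / c) for _ in range(K)]
    return weights, atoms

def normalize(weights):
    """Rescale the weights to sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(1)
weights, atoms = truncated_gamma_process(
    total_mass=5.0, c=1.0, K=50,
    base_sampler=lambda r: r.random(),   # toy base measure: Uniform(0, 1)
    rng=rng,
)
probs = normalize(weights)
```

After normalization, `probs` is a draw from a symmetric Dirichlet(total_mass / K) distribution, the standard finite approximation to Dirichlet process weights. Most of the K weights are close to zero, which is why only a limited number of atoms are effectively used.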


3.2.2 Negative Binomial Process 
A negative binomial process $\mathrm{NBP}(p, G_0)$ [27] is also a
stochastic process, parameterized by a base measure $G_0$ and $p$.
As with the gamma process, a realization of a negative binomial
process $X = \{(n_i, \theta_i)\}_{i=1}^{\infty}$ is a set of points
in the product space $\mathbb{Z}_+ \times \Theta$. Then, we have

$$
X \sim \mathrm{NBP}(p, G_0), \qquad
X = \sum_{i=1}^{\infty} n_i \, \delta_{\theta_i}
\tag{3}
$$

where the $\{n_i\}$ are integers, so the negative binomial process
is normally used for count modelling [3]. Compared with the Poisson
process, which is also suitable for count modelling, the negative
binomial process offers a more flexible variance-to-mean ratio
(VMR) and can therefore model overdispersed counts [27].
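The VMR claim can be made explicit with a short calculation. It assumes the gamma-Poisson parameterization that is used later in Section 4.2, namely $\lambda \sim \mathrm{Gamma}(r, p/(1-p))$ with shape $r$ and scale $p/(1-p)$, and $X \mid \lambda \sim \mathrm{Pois}(\lambda)$, so that marginally $X \sim \mathrm{NB}(r, p)$:

$$
\begin{aligned}
\mathbb{E}[X] &= \mathbb{E}[\lambda] = \frac{r p}{1 - p}, \\
\mathrm{Var}[X] &= \mathbb{E}[\lambda] + \mathrm{Var}[\lambda]
  = \frac{r p}{1 - p} + \frac{r p^2}{(1 - p)^2}
  = \frac{r p}{(1 - p)^2}, \\
\mathrm{VMR} &= \frac{\mathrm{Var}[X]}{\mathbb{E}[X]} = \frac{1}{1 - p} \ge 1 .
\end{aligned}
$$

A Poisson count always has VMR equal to 1, whereas here the VMR grows with $p$, which is what allows the negative binomial process to capture overdispersion.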


3.2.3 Gamma-Negative Binomial Process 

Figure 2: Gamma-negative binomial process topic model

Table 1: Notations used in this paper

Notation   Description
D          number of documents
A          number of authors
N          number of words
AD         author-document mapping matrix
DN         document-word mapping matrix
A_d        number of authors of document d

Normally, the negative binomial process is used as the likelihood
part of a Bayesian model. Like a negative binomial distribution
$x \sim \mathrm{NB}(r, p)$, which has two parameters $r > 0$ and
$p \in [0, 1]$, a negative binomial process admits two kinds of
priors: one is the gamma process [27], as shown in Eq. (4); the
other is the beta process [3], as in $X \sim \mathrm{NBP}(B, r)$.
In this paper, we use the gamma process prior. A gamma-negative
binomial process-based topic model is proposed in [27], as shown in
Fig. 2, and it can be represented as,

$$
\begin{aligned}
\Gamma &\sim \mathrm{GaP}(c_0, H) \\
X_d &\sim \mathrm{NBP}(p_d, \Gamma)
\end{aligned}
\tag{4}
$$

where the base measure $\Gamma$ of the negative binomial process is
a random measure drawn from a gamma process. There is one $X_d$ for
each document, and this hierarchical form makes the documents share
the same base measure $\Gamma$. This gamma-negative binomial
process can be equivalently augmented as a gamma-gamma-Poisson
process,

$$
\begin{aligned}
\Gamma &\sim \mathrm{GaP}(c_0, H) \\
\Gamma_d &\sim \mathrm{GaP}\left(\frac{1 - p_d}{p_d}, \Gamma\right) \\
X_d &\sim \mathrm{PP}(\Gamma_d)
\end{aligned}
\tag{5}
$$

where $X_d \sim \mathrm{PP}(\Gamma_d)$ is a Poisson process with
parameter $\Gamma_d$. This augmentation, which is useful for
designing a closed-form inference algorithm, is equal in
distribution to the gamma-negative binomial process model.

In this paper, we will build an infinite author topic model based
on this gamma-negative binomial process model.
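The equivalence between the gamma-Poisson augmentation and the negative binomial form can be checked numerically for a single atom. The sketch below assumes the parameterization in which the rate is Gamma distributed with shape r and scale p / (1 - p); the sampler, the seed and the values r = 3 and p = 0.4 are illustrative choices, not settings from the paper.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count

def gamma_poisson_draws(r, p, size, seed=0):
    """Draw counts by first sampling a gamma rate, then a Poisson count."""
    rng = random.Random(seed)
    scale = p / (1.0 - p)
    return [sample_poisson(rng.gammavariate(r, scale), rng) for _ in range(size)]

def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

r, p = 3.0, 0.4
draws = gamma_poisson_draws(r, p, size=20000)
empirical_mean, empirical_var = mean_and_variance(draws)
nb_mean = r * p / (1.0 - p)        # mean of NB(r, p) in this parameterization
nb_var = r * p / (1.0 - p) ** 2    # variance of NB(r, p)
```

The empirical moments of the mixed draws should lie close to `nb_mean` and `nb_var`, and the variance exceeds the mean, as expected for overdispersed counts.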

4. INFINITE AUTHOR TOPIC MODEL 

In this section, we first propose our infinite author topic (IAT)
model, and then introduce the Gibbs sampling strategy used to
infer the proposed model.


























Figure 3: Gamma-Gamma-Negative Binomial Process 
Model (3GNB) (left one) and Infinite Author Topic Model 
(IAT) (right one) 


4.1 Model Description 

Consider again the gamma-negative binomial process topic model in
Eqs. (4) and (5): despite its success, this model is fundamentally
the same as the basic topic models, which model data with a
two-level hierarchy: document-keyword. Our aim is to extend the
topic model to a three-level hierarchy: author-document-keyword.
We therefore add another gamma process level to the gamma-negative
binomial process topic model in Eq. (5) to capture the additional
(author) level, analogous to the hierarchical form of the
Hierarchical Dirichlet Process [23],

$$
\begin{aligned}
\Gamma_0 &\sim \mathrm{GaP}(c_0, H) \\
\Gamma_a &\sim \mathrm{GaP}(c_a, \Gamma_0) \\
\Gamma_d &\sim \mathrm{GaP}\left(\frac{1 - p_d}{p_d}, \Gamma_a\right) \\
X_d &\sim \mathrm{PP}(\Gamma_d)
\end{aligned}
\tag{6}
$$

where $\Gamma_a$ is the newly added level for the authors. We call
this model the three-level gamma-negative binomial process topic
model (3GNB), which is shown graphically in the left subfigure of
Fig. 3. However, 3GNB has a limitation: it requires each document
to have only one author.

In the 3GNB model, each document is assigned a realization of a
gamma process,

$$
\Gamma_d = \sum_{k=1}^{\infty} \pi_{d,k} \, \delta_{\theta_k}
\tag{7}
$$

where $\theta_k$ denotes the $k$th topic and $\pi_{d,k}$ is the
weight of the $k$th topic. $\{\pi_{d,k}\}_{k=1}^{\infty}$ can be
viewed as the interest of document $d$ in the topics. The number of
topics can potentially be infinite, which justifies the infinite
summation. However, since the data is limited, the number of
learned topics will also be limited. Similar to the document, each
author is also assigned a realization of a gamma process,

$$
\Gamma_a = \sum_{k=1}^{\infty} \pi_{a,k} \, \delta_{\theta_k}
\tag{8}
$$

where $\{\pi_{a,k}\}_{k=1}^{\infty}$ are the weights of the
interests of author $a$ in the topics. In the 3GNB model, the base
measure of $\Gamma_d$ comes from its author's $\Gamma_a$. This can
be seen as 'interest inheritance'.

To model the setting in which a document has multiple authors, we
combine the gamma processes of all the authors of a document by

$$
\Gamma_a^d = \Gamma_{a_1} \oplus \Gamma_{a_2} \oplus \cdots \oplus \Gamma_{a_{A_d}}
\tag{9}
$$

where $A_d$ is the number of authors of document $d$, $\oplus$ is
the convex combination (each gamma process has the same weight in
this paper) and $\Gamma_a^d$ is the mixed prior for $\Gamma_d$. The
mixed gamma process $\Gamma_a^d$ can be seen as the mixed interests
of all the authors of a document. The revised model is as follows,

$$
\begin{aligned}
\Gamma_0 &\sim \mathrm{GaP}(c_0, H) \\
\Gamma_a &\sim \mathrm{GaP}(c_a, \Gamma_0) \\
\Gamma_a^d &= \Gamma_{a_1} \oplus \Gamma_{a_2} \oplus \cdots \oplus \Gamma_{a_{A_d}} \\
\Gamma_d &\sim \mathrm{GaP}\left(\frac{1 - p_d}{p_d}, \Gamma_a^d\right) \\
X_d &\sim \mathrm{PP}(\Gamma_d)
\end{aligned}
$$

and the graphical representation is shown in the right subfigure of
Fig. 3. Some frequently used notations are explained in Table 1.
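With equal weights, the convex combination in Eq. (9) reduces to averaging the authors' topic weights atom by atom. A minimal sketch, assuming a finite list of weights per author over shared topics; the function name and the author names are hypothetical.

```python
def mix_author_weights(author_weights, doc_authors):
    """Equal-weight convex combination of the authors' topic weights.

    author_weights maps an author id to a list of K non-negative weights
    (one per shared topic); doc_authors lists the authors of one document.
    """
    if not doc_authors:
        raise ValueError("a document needs at least one author")
    num_authors = len(doc_authors)                      # A_d
    num_topics = len(author_weights[doc_authors[0]])    # K
    return [
        sum(author_weights[a][k] for a in doc_authors) / num_authors
        for k in range(num_topics)
    ]

# Hypothetical example: two authors, three topics.
author_weights = {"alice": [1.0, 3.0, 0.0], "bob": [3.0, 5.0, 2.0]}
mixed = mix_author_weights(author_weights, ["alice", "bob"])
```

For a single-author document the mixed weights equal that author's weights, so the 3GNB model is recovered as a special case.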

4.2 Model Inference 

Posterior inference under infinite mixtures is difficult, and a
common workaround in nonparametric Bayesian learning is
truncation. The widely accepted truncation method uses a relatively
large $K$ as the (potential) maximum number of topics. Under
truncation, the model below is a good approximation to the infinite
model,

$$
\begin{aligned}
\gamma_0 &\sim \mathrm{Gamma}(e_0, 1/f_0) \\
r_{0,k} \mid \gamma_0, c_0 &\sim \mathrm{Gamma}(\gamma_0 / K, 1/c_0) \\
r_{a,k} \mid r_{0,k}, c_a &\sim \mathrm{Gamma}(r_{0,k}, 1/c_a) \\
p_d &\sim \mathrm{Beta}(a_0, b_0) \\
r_{a,k}^{d} &= r_{a_1,k} \oplus r_{a_2,k} \oplus \cdots \\
r_{d,k} \mid r_{a,k}^{d}, p_d &\sim \mathrm{Gamma}\left(r_{a,k}^{d}, \frac{p_d}{1 - p_d}\right) \\
n_{d,k} &\sim \mathrm{Pois}(r_{d,k}) \\
N_d &= \sum_{k=1}^{K} n_{d,k} \\
\theta_{1:K} &\sim \frac{H}{\gamma_0} \\
z_{d,n} &\sim \mathrm{Multi}\left(\frac{r_{d,1}}{\sum_k r_{d,k}}, \frac{r_{d,2}}{\sum_k r_{d,k}}, \frac{r_{d,3}}{\sum_k r_{d,k}}, \ldots\right) \\
w_{d,n} &\sim \mathrm{Discrete}(\theta_{z_{d,n}})
\end{aligned}
$$

where $\gamma_0 = \int dH$ is the total mass of measure $H$, and
the parameters are given appropriate priors. Here, $H$ is an
$N$-dimensional Dirichlet distribution, and each $\theta$ is a
topic represented as an $N$-dimensional vector.
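The truncated hierarchy can be simulated top-down to see how topic counts arise for each document. The following sketch covers the chain from gamma_0 down to the per-topic word counts n_{d,k} only; it assumes equal-weight mixing of the authors, uses unit hyper-parameters by default, and the function names and numerical guards are illustrative assumptions.

```python
import math
import random

TINY = 1e-12  # guard: random.gammavariate needs strictly positive arguments

def sample_poisson(lam, rng):
    """Poisson draw: Knuth's method on chunks of rate <= 30, summed
    (independent Poisson counts add up to a Poisson count)."""
    total = 0
    while lam > 0.0:
        chunk = min(lam, 30.0)
        threshold, prod = math.exp(-chunk), rng.random()
        while prod > threshold:
            total += 1
            prod *= rng.random()
        lam -= chunk
    return total

def generate_truncated_iat(doc_authors, num_authors, K, e0=1.0, f0=1.0,
                           c0=1.0, ca=1.0, a0=1.0, b0=1.0, seed=0):
    """Sample gamma_0, r_0, r_a and the per-document topic counts n_{d,k}."""
    rng = random.Random(seed)
    gamma0 = max(rng.gammavariate(e0, 1.0 / f0), TINY)
    r0 = [max(rng.gammavariate(gamma0 / K, 1.0 / c0), TINY) for _ in range(K)]
    r_a = [[max(rng.gammavariate(r0[k], 1.0 / ca), TINY) for k in range(K)]
           for _ in range(num_authors)]
    counts = []
    for authors in doc_authors:
        p_d = min(max(rng.betavariate(a0, b0), 1e-9), 1.0 - 1e-9)
        scale = p_d / (1.0 - p_d)
        row = []
        for k in range(K):
            # equal-weight mix of the authors' weights for topic k
            mixed = sum(r_a[a][k] for a in authors) / len(authors)
            r_dk = rng.gammavariate(mixed, scale)   # r_{d,k}
            row.append(sample_poisson(r_dk, rng))   # n_{d,k} ~ Pois(r_{d,k})
        counts.append(row)
    return gamma0, r0, r_a, counts

# Three toy documents over two authors, truncated at K = 8 topics.
gamma0, r0, r_a, counts = generate_truncated_iat([[0, 1], [1], [0]],
                                                 num_authors=2, K=8)
doc_lengths = [sum(row) for row in counts]   # N_d = sum_k n_{d,k}
```

Because most of the `r0` weights are tiny, many topics receive zero counts in every document; these are the inactive topics that a sampler would effectively ignore.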

The difficult part of the inference for this model is the mixed
part $\Gamma_a^d$ or $r_{a,k}^{d}$. Since
$r_{a,k}^{d} = r_{a_1,k} \oplus r_{a_2,k} \oplus \cdots$ is a mixed
value, it is hard to infer the posterior of $r_a$ through its
likelihood. To resolve this issue, we first introduce the additive
property of the negative binomial distribution.

Theorem 1. If each $X_i$ follows a negative binomial distribution
with parameters $r_i$ and $p$, and the $X_i$ are independent, then
$\sum_i X_i$ follows a negative binomial distribution with
parameters $\sum_i r_i$ and $p$.
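A short justification of Theorem 1 via probability generating functions is sketched below. It assumes the parameterization $P(X = k) = \frac{\Gamma(k + r)}{k!\,\Gamma(r)} (1 - p)^r p^k$, which matches the gamma-Poisson construction used in this section:

$$
\begin{aligned}
\mathbb{E}\left[s^{X_i}\right] &= \left(\frac{1 - p}{1 - p s}\right)^{r_i},
  \qquad |s| < 1/p, \\
\mathbb{E}\left[s^{\sum_i X_i}\right]
  &= \prod_i \mathbb{E}\left[s^{X_i}\right]
  = \left(\frac{1 - p}{1 - p s}\right)^{\sum_i r_i} .
\end{aligned}
$$

The product form uses the independence of the $X_i$, and the right-hand side is the generating function of $\mathrm{NB}(\sum_i r_i, p)$. The argument relies on all the $X_i$ sharing the same $p$.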


In the model, we have

$$
r_{d,k} \mid \{r_a\}, p_d \sim \mathrm{Gamma}\left(r_{a,k}^{d}, \frac{p_d}{1 - p_d}\right),
\qquad n_{d,k} \sim \mathrm{Pois}(r_{d,k})
\tag{10}
$$

which is equal (in distribution) to

$$
n_{d,k} \sim \mathrm{NB}(r_{a,k}^{d}, p_d)
\tag{11}
$$

and, according to Theorem 1, it is further equal (in distribution)
to

$$
n_{d,k}^{a} \sim \mathrm{NB}\left(\frac{r_{a,k}}{A_d}, p_d\right),
\qquad n_{d,k} = \sum_{a \in a_d} n_{d,k}^{a}
\tag{12}
$$

where $A_d$ is the number of authors of document $d$.

We have thus split $n_{d,k}$, the number of words assigned to topic
$k$ in document $d$, into $A_d$ independent variables
$\{n_{d,k}^{a}\}$. Here, $n_{d,k}^{a}$ denotes the number of words
assigned to topic $k$ from author $a$ in document $d$. Eq. (12)
provides the likelihood for $r_a$, so we can update $r_a$ using
$n_{d}^{a}$. Introducing the auxiliary variables $\{n_{d,k}^{a}\}$
resolves the inference difficulty caused by the mixed gamma
process. Note that the independence between the elements of
$\{n_{d,k}^{a}\}$ is important, because it allows each
$n_{d,k}^{a}$ to be updated independently.

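According to the relationship between the negative binomial
distribution and the gamma-Poisson distribution, for each
$n_{d,k}^{a}$, we have:

$$
n_{d,k}^{a} \sim \mathrm{NB}\left(\frac{r_{a,k}}{A_d}, p_d\right)
\;\Longrightarrow\;
r_{d,k}^{a} \sim \mathrm{Gamma}\left(\frac{r_{a,k}}{A_d}, \frac{p_d}{1 - p_d}\right),
\quad n_{d,k}^{a} \sim \mathrm{Pois}(r_{d,k}^{a})
\tag{13}
$$

We want to highlight that $r_{d,k}^{a}$ is different from
$r_{a,k}^{d}$: $r_{a,k}^{d}$ is the mixture of the gamma processes
$\Gamma_a$ of the multiple authors, which serves as the base of the
gamma process $\Gamma_d$ of document $d$, whereas $r_{d,k}^{a}$ is
the interest of document $d$ in topic $k$ inherited from author
$a$.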

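Due to the non-conjugacy of the gamma distribution and the negative
binomial distribution, it is difficult to update $r_a$ with a gamma
prior. To obtain an inference procedure with only closed-form
conditional distributions, we use the following result on the
negative binomial process.

Theorem 2. [14, 27] If $X$ follows a negative binomial distribution
$X \sim \mathrm{NB}(r, p)$ with parameters $r$ and $p$, then $X$
can also be generated from a compound Poisson distribution

$$
X = \sum_{t=1}^{l} u_t, \qquad
u_t \overset{iid}{\sim} \mathrm{Log}(p), \qquad
l \sim \mathrm{Pois}(-r \ln(1 - p))
\tag{14}
$$

where $\mathrm{Log}(\cdot)$ is a logarithmic distribution.
Furthermore, this Poisson-logarithmic bivariate count distribution,
$p(X, l)$, can be expressed as

$$
X \sim \mathrm{NB}(r, p), \qquad l \sim \mathrm{CRT}(X, r)
\tag{15}
$$

where CRT denotes the Chinese restaurant table distribution.

With Theorem 2, Eq. (13) is also equal to

$$
\begin{aligned}
& n_{d,k}^{a} \sim \mathrm{NB}\left(\frac{r_{a,k}}{A_d}, p_d\right) \\
& n_{d,k}^{a} = \sum_{t=1}^{l_{d,k}^{a}} u_t, \quad
  u_t \sim \mathrm{Log}(p_d), \quad
  l_{d,k}^{a} \sim \mathrm{Pois}\left(-\frac{r_{a,k}}{A_d} \ln(1 - p_d)\right) \\
& l_{d,k}^{a} \sim \mathrm{CRT}\left(n_{d,k}^{a}, \frac{r_{a,k}}{A_d}\right), \quad
  n_{d,k}^{a} \sim \mathrm{NB}\left(\frac{r_{a,k}}{A_d}, p_d\right)
\end{aligned}
\tag{16}
$$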

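Finally, we can update all $n_{d,k}^{a}$ by,

$$
\left(n_{d,1}^{a_1}, n_{d,2}^{a_1}, \ldots, n_{d,K}^{a_{A_d}}\right)
\sim \mathrm{Mult}\left(N_d;\; \frac{r_{d,1}^{a_1}}{r}, \ldots, \frac{r_{d,K}^{a_{A_d}}}{r}\right),
\qquad r = \sum_{a} \sum_{k} r_{d,k}^{a}
\tag{17}
$$

and for each word $n$ in a document $d$, we can assign it to a
topic $k$ and an author $a$ by

$$
\begin{aligned}
p(z_{d,n} = k, i_{d,n} = a) &\propto \frac{r_{d,k}^{a}}{r} \\
n_{d,k} &= \sum_{n} \delta(z_{d,n} = k) \\
n_{a,k} &= \sum_{d} \sum_{n} \delta(z_{d,n} = k \text{ AND } i_{d,n} = a)
\end{aligned}
\tag{18}
$$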

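With these changes of variables, the original model is
reformulated as,

$$
\begin{aligned}
\gamma_0 &\sim \mathrm{Gamma}(e_0, 1/f_0) \\
r_{0,k} \mid \gamma_0, c_0 &\sim \mathrm{Gamma}(\gamma_0 / K, 1/c_0) \\
p_d &\sim \mathrm{Beta}(a_0, b_0) \\
r_{a,k} \mid r_{0,k}, c_a &\sim \mathrm{Gamma}(r_{0,k}, 1/c_a) \\
r_{a,k}^{d} &= r_{a_1,k} \oplus r_{a_2,k} \oplus \cdots \\
r_{d,k} \mid r_{a,k}^{d}, p_d &\sim \mathrm{Gamma}\left(r_{a,k}^{d}, \frac{p_d}{1 - p_d}\right) \\
r_{d,k}^{a} &\sim \mathrm{Gamma}\left(\frac{r_{a,k}}{A_d}, \frac{p_d}{1 - p_d}\right), \quad a \in a_d \\
z_{d,n} &\sim \mathrm{Discrete}\left(\frac{r_{d,k}^{a}}{r}, \cdots\right) \\
n_{d,k} &= \sum_{n} \delta(z_{d,n} = k) \\
n_{a,k} &= \sum_{d} \sum_{n} \delta(z_{d,n} = k \text{ AND } i_{d,n} = a) \\
n_{d,k}^{a} &= \sum_{n} \delta(z_{d,n} = k \text{ AND } i_{d,n} = a)
\end{aligned}
\tag{19}
$$

In the following, a Gibbs sampling algorithm is designed for the
posterior inference and all the conditional distributions are
listed.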

Sampling $z$

$$
p(z_{d,n} = k, i_{d,n} = a \mid \cdots) \propto \theta_{k,n} \cdot r_{d,k}^{a}
\tag{20}
$$

Sampling $r_d^a$

$$
p(r_{d,k}^{a} \mid \cdots) \propto \mathrm{Gamma}\left(\frac{r_{a,k}}{A_d} + n_{d,k}^{a},\; p_d\right)
\tag{21}
$$

where $n_{d,k}^{a}$ is the number of words in document $d$ with
author $a$ and topic $k$.

Sampling $l_d$

$$
p(l_{d,k}^{a} \mid \cdots) \propto \mathrm{CRT}\left(n_{d,k}^{a}, \frac{r_{a,k}}{A_d}\right)
\tag{22}
$$

Sampling $p_d$ and $r_d$

$$
\begin{aligned}
p(p_d \mid \cdots) &\propto \mathrm{Beta}\left(a_0 + \sum_k n_{d,k},\; b_0 + \sum_k r_{a,k}^{d}\right) \\
r_{a,k}^{d} &= r_{a_1,k} \oplus r_{a_2,k} \oplus \cdots \\
p(r_{d,k} \mid \cdots) &\propto \mathrm{Gamma}\left(r_{a,k}^{d} + n_{d,k},\; p_d\right)
\end{aligned}
\tag{23}
$$

Sampling $r_a$

$$
p(r_{a,k} \mid \cdots) \propto \mathrm{Gamma}\left(r_{0,k} + \sum_{d \text{ with } a} l_{d,k}^{a},\;
\frac{1}{c_a - \sum_{d \text{ with } a} \frac{1}{A_d} \ln(1 - p_d)}\right)
\tag{24}
$$

Sampling $l_a$

$$
p(l_{a,k} \mid \cdots) \propto \mathrm{CRT}\left(\sum_{d \text{ with } a} l_{d,k}^{a},\; r_{0,k}\right)
\tag{25}
$$

Sampling $r_{0,k}$

$$
p(r_{0,k} \mid \cdots) \propto \mathrm{Gamma}\left(\gamma_0 / K + \sum_a l_{a,k},\;
\frac{1}{c_0 - \sum_a \ln(1 - p'_a)}\right)
\tag{26}
$$

where

$$
p'_a = \frac{-\sum_{d \text{ with } a} \frac{1}{A_d} \ln(1 - p_d)}
{c_a - \sum_{d \text{ with } a} \frac{1}{A_d} \ln(1 - p_d)}
\tag{27}
$$

Sampling $l'_k$

$$
p(l'_k \mid \cdots) \propto \mathrm{CRT}\left(\sum_a l_{a,k},\; \gamma_0 / K\right)
\tag{28}
$$

Sampling $\gamma_0$

$$
p(\gamma_0 \mid \cdots) \propto \mathrm{Gamma}\left(e_0 + \sum_k l'_k,\;
\frac{1}{f_0 - \ln(1 - p'')}\right)
\tag{29}
$$

where

$$
p'' = \frac{-\sum_a \ln(1 - p'_a)}{c_0 - \sum_a \ln(1 - p'_a)}
\tag{30}
$$

Sampling $\theta_k$

$$
p(\theta_k \mid \cdots) \propto H(\theta_k) \prod_{d,n:\, z_{d,n} = k} \theta_{k,n}
\tag{31}
$$

All of these conditional distributions have closed forms, which
makes them easy to update and implement. Sampling from the CRT
distribution is described in [27]. The whole procedure is
summarized in Algorithm 1.

Algorithm 1: Gibbs Sampler for IAT

Input: D, A, N, AD, DN
Output: K_real, {θ}, {r_a}, {r_d}
initialization;
while iter < max_iter do
    for d = 1; d ≤ D do
        for n = 1; n ≤ N_d do
            Update z_{d,n} and i_{d,n} by Eq. (20);
        for a = 1; a ≤ A_d do
            Update r^a_{d,k} by Eq. (21); Update l^a_{d,k} by Eq. (22);
        Update r_{d,k} and p_d by Eq. (23);
    for a = 1; a ≤ A do
        Update r_{a,k} by Eq. (24); Update l_{a,k} by Eq. (25);
    Update r_{0,k} by Eq. (26); Update l'_k by Eq. (28);
    Update γ_0 by Eq. (29); Update θ by Eq. (31);
    iter++;
Identify K_real;
Select the sample with largest likelihood and K = K_real;
return {θ}, {r_a}, {r_d};

Table 2: Statistics of Datasets

Datasets   D        A        N
NIPS       1,740    2,037    13,649
DBLP       28,569   28,702   11,771

Table 3: Groups of Datasets DBLP

           D training   D test   A       N
group 1    1,072        319      1,115   3,783
group 2    1,071        316      1,094   3,782
group 3    1,075        305      1,071   3,788
group 4    1,076        339      1,104   3,823
group 5    1,079        310      1,111   3,841

Table 4: Groups of Datasets NIPS

           D training   D test   A       N
group 1    1,503        237      2,037   5,110
group 2    1,495        245      2,037   5,110
group 3    1,511        229      2,037   5,110

After we obtain all the samples of the posterior
$p(\theta, r_a, r_d, r_0, z_{d,n}, p_d, \gamma_0, n_d^a \mid N_d, AD, DN, e_0, f_0, c_0, c_a, a_0, b_0)$
of the latent variables and remove the burn-in stage, we first
identify the topic number with the largest frequency as
$K_{real}$, and then find the sample with the largest likelihood
and $K = K_{real}$ among these samples. The output of the Gibbs
sampler consists of the latent variables $\theta$, $r_a$ and $r_d$
in this sample.
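The final selection step can be sketched as follows. The snippet assumes that each retained post-burn-in sample is stored as a dictionary with a hypothetical key `"K"` for its number of active topics and `"loglik"` for its training log-likelihood; these names and the toy values are illustrative and not part of the original algorithm description.

```python
from collections import Counter

def select_sample(samples):
    """Pick K_real as the most frequent topic number among the samples,
    then return the sample with the largest likelihood among those
    whose topic number equals K_real."""
    counts = Counter(s["K"] for s in samples)
    k_real = counts.most_common(1)[0][0]
    best = max((s for s in samples if s["K"] == k_real),
               key=lambda s: s["loglik"])
    return k_real, best

# Toy post-burn-in samples (hypothetical values).
samples = [
    {"K": 3, "loglik": -10.0},
    {"K": 4, "loglik": -5.0},
    {"K": 3, "loglik": -8.0},
]
k_real, best = select_sample(samples)
```

In this toy run the sample with K = 4 has the highest likelihood overall, but it is not returned because K = 3 is the most frequent topic number. The likelihood is only compared among samples that agree on the topic number.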

5. EXPERIMENTS 

In this section, we evaluate the proposed infinite author 
topic model (IAT), and compare it with the finite author- 
topic model (ATM) on different datasets. 

5.1 Datasets 

Two public datasets are used in this paper:

• NIPS papers (footnote 1): This dataset contains papers from the
NIPS conferences between 1987 and 1999. More description can be
found in [20];

• DBLP papers (footnote 2): The abstracts and authors of papers
are extracted through the DBLP interface from four areas:
database, data mining, information retrieval and artificial
intelligence. More description can be found in [6].

Some statistics of the two datasets are shown in Table 2. For each
dataset, we randomly select some documents as training data and
test data. Table 3 and Table 4 show the selection results on the
two datasets. The numbers of selected training and test documents
are given in the columns D training and D test of Tables 3 and 4.
The requirement for the selection is that the training and test
documents must share some authors and some words. This requirement
ensures that the learned topics and authors' interests can be used
to predict the test documents.

5.2 Evaluation Metrics 

To evaluate the performance of the proposed model, we calculate
the perplexity of the test documents using the learned topics and
the authors' interests in these topics. Perplexity is widely used
in language modeling to assess the predictive power of a model
[20]. It measures how surprising the words in the test documents
are from the model's perspective. It can be computed as,

$$
\begin{aligned}
\mathrm{Perplexity} &= \exp\left(-\sum_d \log p(w_d \mid a_d)\right) \\
&= \exp\left(-\sum_d \log \sum_k p(w_d \mid \theta_k)\, p(\theta_k \mid a_d)\right)
\end{aligned}
\tag{32}
$$

where $a_d$ denotes the authors of test document $d$. The smaller
the perplexity, the better the predictive ability of the model.
Since we use the same test documents for different models, the
normalization by the number of test words is omitted because it
does not influence the model comparisons.
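A minimal sketch of this computation is given below. It assumes that the quantity inside the exponential is a sum of document log-probabilities, that each document's probability is a mixture over topics of the product of its word probabilities, and that all topic-word probabilities are strictly positive; the function and argument names are illustrative.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def unnormalized_perplexity(test_docs, topics, topic_given_authors):
    """exp(-sum_d log sum_k p(w_d | theta_k) p(theta_k | a_d)).

    test_docs[d] is a list of word ids, topics[k][w] is the probability of
    word w under topic k, and topic_given_authors[d][k] is p(theta_k | a_d).
    """
    total = 0.0
    for words, mix in zip(test_docs, topic_given_authors):
        per_topic = [
            math.log(mix[k]) + sum(math.log(topics[k][w]) for w in words)
            for k in range(len(topics)) if mix[k] > 0
        ]
        total += log_sum_exp(per_topic)
    return math.exp(-total)

# One test document containing a single word (id 0), two topics.
topics = [[0.5, 0.5], [0.25, 0.75]]
value = unnormalized_perplexity([[0]], topics, [[0.5, 0.5]])
```

Here the document probability is 0.5 * 0.5 + 0.5 * 0.25 = 0.375, so the returned value is 1 / 0.375. For realistic corpora the summed log-probability is very negative and the exponential overflows, so in practice one would compare the summed log-probabilities directly or normalize by the number of test words.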

Another evaluation metric is the training data likelihood,

$$
\mathrm{logLikelihood} = \sum_d \log p(w_d \mid \theta, r_a, r_d)
\tag{33}
$$

This is a measure of the probability of the training documents
under the learned latent variables $\theta$, $r_a$ and $r_d$. It
can be understood as 'how well the model fits the training data'.
The larger the likelihood, the better the model fits the training
data.

5.3 Results Analysis 

For the DBLP dataset, all results are shown in Fig. 4. Each row of
Fig. 4 corresponds to a group of the DBLP dataset in Table 3. The
left subfigures compare the data log-likelihood. Here, we vary the
number of active topics for the ATM: K = 100, K = 200, K = 300,
K = 400 and K = 500. In these subfigures, the proposed IAT model
outperforms the ATM for every preset topic number. (For the rest of
this section, the hyper-parameters of IAT are set by experience as
a_0 = 1, b_0 = 1, e_0 = 1, f_0 = 1, c_0 = 1 and c_a = 1.) This
means that IAT fits the training documents better than the ATM
and, more importantly, that IAT does not depend on domain knowledge
to predefine the active topic number, which makes the method widely
applicable.

1 http://www.datalab.uci.edu/author-topic/NIPs.htm
2 http://www.cs.uiuc.edu/ libdeng/data/kdd 2011 .htm

The middle subfigures in Fig. 4 show how the number of active
topics changes during the iterations of the IAT sampler (the
number of active topics is initialized to the number of training
documents). These curves show that the number of active topics
drops sharply during the burn-in stage of the sampling and begins
to stabilize after about 200 iterations. Since the documents differ
in content but are similar in number across the groups, the learned
topic numbers differ across the groups. These numbers are: group 1:
K = 519; group 2: K = 332; group 3: K = 493; group 4: K = 465;
group 5: K = 504.

To show the effectiveness of the proposed model, we also compare
the performance of the two models (IAT and ATM) on test document
prediction using the perplexity in Eq. (32). Since the training and
test documents share some authors, we can compute the perplexity of
the test documents according to the learned topics and the authors'
interests in them. At each iteration, the perplexity of the test
documents is computed using the latent variables $\{\theta\}$,
$\{r_a\}$ and $\{r_d\}$ at that iteration. The results are shown in
the right subfigures of Fig. 4. In each subfigure, the first bar
denotes the mean perplexity of the proposed IAT model over all
iterations except the burn-in stage (iterations 1 to 200), and the
other bars denote ATM with different (predefined) topic numbers.
The standard deviations are also shown in the subfigures. The
proposed model achieves the best performance (smallest perplexity).
The standard deviation of IAT is larger than that of ATM. The
reason is that the number of active topics changes during the
iterations of IAT but not of ATM, so in theory the random-walk
space of the Gibbs sampler of IAT can be larger than that of ATM.
Even with this larger standard deviation, the mean perplexity of
IAT is smaller than that of ATM.

For the NIPS dataset, all results are shown in Fig. 5. As for the
DBLP dataset, the log-likelihoods of IAT and of ATM with different
predefined active topic numbers are shown in the left column of
Fig. 5. The subfigures in the middle column show the convergence of
IAT (group 1: 367; group 2: 529; group 3: 354). In particular, we
found that the log-likelihood of ATM increases when the topic
number decreases. Therefore, we have also compared with ATM with
only two (the minimum number) topics, as shown in the left
subfigures of Fig. 5. It can be seen that the proposed IAT model
achieves a larger log-likelihood and a smaller perplexity than ATM,
except for the case where ATM is set to have 10 topics in group 2.
Even so, the ATM with 10 topics in group 2 has almost the same
log-likelihood on the training documents as IAT. Moreover, the ATM
with 10 topics takes 800 iterations to reach this stability,
whereas IAT takes fewer than 50 iterations to reach the same
stability.

6. CONCLUSIONS AND FURTHER STUDY 

We have developed an infinite author topic model that
automatically learns the latent features of the
author-document-keyword hierarchy, which include the hidden topics,
the authors' interests in these topics and the number of topics,
from text corpora. Stochastic processes are adopted instead of
fixed-dimensional probability distributions. The model uses a mixed
author gamma process as the base measure of the document gamma
process to capture the author-document mapping. We have
demonstrated on real-world datasets that the designed Gibbs
sampling algorithm can be used to learn such an infinite author
topic model.

Other potential applications of this work include multi-label
learning [25]: the 'authors' in the proposed model can be seen as
labels, and the inference of the model can be seen as the training
of a multi-label classifier. The learned topics can be seen as an
infinite feature space. This is left for further study.

Figure 4: Results from IAT and ATM on five groups of DBLP dataset. Each row denotes a group. In each row, the left
subfigure shows the log-likelihood comparison between IAT and ATM with different (predefined) topic numbers: K = 100,
K = 200, K = 300, K = 400, and K = 500; the middle subfigure shows the change of active topic number of IAT during the
iteration of Gibbs sampling; the right subfigure shows the perplexity comparison between IAT and ATMs.

Figure 5: Results from IAT and ATM on three groups of NIPS dataset. Each row denotes a group. In each row, the left
subfigure shows the log-likelihood comparison between IAT and ATM with different (predefined) topic numbers: K = 2,
K = 10, K = 100, K = 200, K = 300, K = 400, and K = 500; the middle subfigure shows the change of active topic number
of IAT during the iteration of Gibbs sampling; the right subfigure shows the perplexity comparison between IAT and ATMs.